
Principal Component Regression




On the number of variables to use in principal component regression

Neural Information Processing Systems

We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p > n$, the estimator is the least $\ell_2$ norm solution over the selected features.
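As a concrete illustration, here is a minimal NumPy sketch of the estimator described in the abstract; the synthetic data, sizes, and variable names are illustrative assumptions, not from the paper. With the features pre-sorted by variance, fitting least squares on the first $p$ columns matches PCR when $p \le n$, and `np.linalg.lstsq` returns the minimum-$\ell_2$-norm solution when $p > n$.

```python
# Sketch of the variance-ordered feature-selection estimator (illustrative
# synthetic data; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 200                                         # sample size, total features
variances = np.sort(rng.uniform(0.1, 2.0, N))[::-1]   # decreasing variances
X = rng.normal(0.0, np.sqrt(variances), size=(n, N))  # uncorrelated Gaussian features
beta = rng.normal(0.0, 1.0, N)
y = X @ beta + 0.1 * rng.normal(size=n)

def select_and_fit(X, y, p):
    """Least squares on the first p (variance-ordered) features.
    For p <= n this is OLS, which coincides with PCR here because the
    features are uncorrelated and pre-sorted by variance; for p > n,
    lstsq returns the minimum-l2-norm interpolating solution."""
    Xp = X[:, :p]
    coef, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    return coef

beta_hat = select_and_fit(X, y, p=120)   # p > n: minimum-norm solution
```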


On Robustness of Principal Component Regression

Neural Information Processing Systems

Consider the setting of Linear Regression where the observed response variables are, in expectation, linear functions of the $p$-dimensional covariates. To achieve vanishing prediction error, the number of required samples then scales faster than $p\sigma^2$, where $\sigma^2$ is a bound on the noise variance. In a high-dimensional setting where $p$ is large but the covariates admit a low-dimensional representation (say $r \ll p$), Principal Component Regression (PCR), cf.
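For reference, a generic PCR sketch is given below: project the covariates onto the top-$r$ principal directions, regress on the resulting scores, and map the coefficients back to feature space. This is the textbook procedure, not the paper's specific robust variant or its transductive setup.

```python
# Generic PCR: regress y on the top-r principal-component scores of X
# (a textbook sketch, not the paper's method).
import numpy as np

def pcr_fit(X, y, r):
    Xc = X - X.mean(axis=0)                      # center the covariates
    yc = y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:r].T                       # n x r projected covariates
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    return Vt[:r].T @ gamma                      # coefficients in the original p dims
```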


A PLS-Integrated LASSO Method with Application in Index Tracking

Tang, Shiqin, Dong, Yining, Qin, S. Joe

arXiv.org Machine Learning

In traditional multivariate data analysis, dimension reduction and regression have been treated as distinct endeavors. Established techniques such as principal component regression (PCR) and partial least squares (PLS) regression traditionally compute latent components as intermediary steps -- although with different underlying criteria -- before proceeding with the regression analysis. In this paper, we introduce an innovative regression methodology named PLS-integrated Lasso (PLS-Lasso) that integrates the concept of dimension reduction directly into the regression process. We present two distinct formulations for PLS-Lasso, denoted as PLS-Lasso-v1 and PLS-Lasso-v2, along with clear and effective algorithms that ensure convergence to global optima. PLS-Lasso-v1 and PLS-Lasso-v2 are compared with Lasso on the task of financial index tracking and show promising results.
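The PLS-Lasso-v1 and PLS-Lasso-v2 formulations are specific to the paper. As a crude baseline in the same spirit, the sketch below simply pipelines PLS dimension reduction into a Lasso fit with scikit-learn; it is an assumed stand-in for comparison, not the authors' integrated algorithm.

```python
# Two-stage PLS -> Lasso baseline (an assumed stand-in; the paper's
# PLS-Lasso integrates dimension reduction into the regression itself).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Lasso

def pls_then_lasso(X, y, n_components=5, alpha=0.01):
    pls = PLSRegression(n_components=n_components).fit(X, y)
    T = pls.transform(X)                   # latent PLS scores
    lasso = Lasso(alpha=alpha).fit(T, y)   # sparse fit in the latent space
    return pls, lasso
```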


Calibrated Principal Component Regression

Wu, Yixuan Florence, Zhu, Yilun, Cao, Lei, Shi, Naichen

arXiv.org Machine Learning

We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside the retained principal components (PC). To mitigate the bias, we propose Calibrated Principal Component Regression (CPCR), which first learns a low-variance prior in the PC subspace and then calibrates the model in the original feature space via a centered Tikhonov step. CPCR leverages cross-fitting and controls the truncation bias by softening PCR's hard cutoff. Theoretically, we calculate the out-of-sample risk in the random matrix regime, which shows that CPCR outperforms standard PCR when the regression signal has non-negligible components in low-variance directions. Empirically, CPCR consistently improves prediction across multiple overparameterized problems. The results highlight CPCR's stability and flexibility in modern overparameterized settings.
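One possible reading of the CPCR recipe in this abstract is sketched below; it is a guess under stated assumptions (the paper's cross-fitting is omitted, and the exact calibration step may differ): fit PCR in the top-$r$ PC subspace to get a prior $b_{\mathrm{pcr}}$, then solve a Tikhonov problem centered at that prior, $\min_b \|y - Xb\|^2 + \lambda \|b - b_{\mathrm{pcr}}\|^2$, in the original feature space.

```python
# Hedged sketch of a PCR-prior + centered-Tikhonov calibration
# (cross-fitting omitted; details may differ from the paper's CPCR).
import numpy as np

def cpcr_sketch(X, y, r, lam):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:r].T
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    b_pcr = Vt[:r].T @ gamma                 # low-variance "prior" in feature space
    p = X.shape[1]
    A = Xc.T @ Xc + lam * np.eye(p)          # normal equations of the centered
    return np.linalg.solve(A, Xc.T @ yc + lam * b_pcr)  # Tikhonov objective
```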



e96ed478dab8595a7dbda4cbcbee168f-Reviews.html

Neural Information Processing Systems

This paper proposes a simple latent factor model for one-shot learning with continuous outputs where very few observations are available. Specifically, it derives risk approximations in an asymptotic regime where the number of training examples is fixed and the number of features in the X space diverges. Building on the principal component regression (PCR) estimator, two estimators, a bias-corrected estimator and a so-called oracle estimator, are proposed, and bounds for their risks are derived. These bounds provide insights into the significance of various parameters relevant to one-shot learning.



On Robustness of Principal Component Regression: Author Response

Neural Information Processing Systems

We begin by thanking all reviewers for their extremely encouraging and helpful responses. We agree that the fact that we apply PCR to both the training and testing covariates should be placed more explicitly in the context of transductive semi-supervised learning. We have striven to interpret our major theorem results (Thm 4.2 & Thm 5.1) by: (i) providing examples of natural generating [...] Proposition 4.2, should be tight). Their empirical results support our theoretical guarantees.